Scaling Up Context-Sensitive Text Correction

نویسندگان

  • Andrew J. Carlson
  • Jeffrey Rosen
  • Dan Roth
چکیده

The main challenge in an effort to build a realistic system with context-sensitive inference capabilities, beyond accuracy, is scalability. This paper studies this problem in the context of a learning-based approach to context sensitive text correction – the task of fixing spelling errors that result in valid words, such as substituting to for too, casual for causal, and so on. Research papers on this problem have developed algorithms that can achieve fairly high accuracy, in many cases over 90%. However, this level of performance is not sufficient for a large coverage practical system since it implies a low sentence level performance. We examine and offer solutions to several issues relating to scaling up a context sensitive text correction system. In particular, we suggest methods to reduce the memory requirements while maintaining a high level of performance and show that this can still allow the system to adapt to new domains. Most important, we show how to significantly increase the coverage of the system to realistic levels, while providing a very high level of performance, at the 99% level.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Context-Sensitive Word Distance by Adaptive Scaling of a Semantic Space

This paper proposes a computationally feasible method for measuring the context-sensitive semantic distance between words. The distance is computed by adaptive scaling of a semantic space. In the semantic space, each word in the vocabulary V is represented by a multidimensional vector which is extracted from an English dictionary through principal component analysis. Given a word set C which sp...

متن کامل

Context-Sensitive Measurement of Word Distance by Adaptive Scaling of a Semantic Space

The paper proposes a computationally feasible method for measuring contextsensitive semantic distance between words. The distance is computed by adaptive scaling of a semantic space. In the semantic space, each word in the vocabulary V is represented by a multidimensional vector which is obtained from an English dictionary through a principal component analysis. Given a word set C which specifi...

متن کامل

Unsupervised Context-Sensitive Spelling Correction of English and Dutch Clinical Free-Text with Word and Character N-Gram Embeddings

We present an unsupervised context-sensitive spelling correction method for clinical free-text that uses word and character n-gram embeddings. Our method generates misspelling replacement candidates and ranks them according to their semantic fit, by calculating a weighted cosine similarity between the vectorized representation of a candidate and the misspelling context. To tune the parameters o...

متن کامل

Turning Quantitative: An Analytic Scale to Do Critical Discourse Analysis

Critical Discourse Analysis (CDA) could be seen as a theory in qualitative more than in qualitative stud- ies. This might have led to difficulty in doing CDA. Accordingly, this study attempted to develop a quan- titative profile in the form of an analytic rubric. For this purpose, Fairclough’s model of CDA was select- ed as the research framework. The techniques used for structuring analy...

متن کامل

Language Models for Contextual Error Detection and Correction

The problem of identifying and correcting confusibles, i.e. context-sensitive spelling errors, in text is typically tackled using specifically trained machine learning classifiers. For each different set of confusibles, a specific classifier is trained and tuned. In this research, we investigate a more generic approach to context-sensitive confusible correction. Instead of using specific classi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001